85 research outputs found

    Taraldsen’s generalization in diachrony : evidence from a diachronic corpus

    International audience. We present the first large-scale quantitative investigation of the syncretisation of verbal subject agreement in Medieval French and test a classic analysis which relates non-syncretic agreement and null subjects as parts of the same grammar (e.g. Rizzi 1986, Adams 1987, Alexiadou & Anagnostopoulou 1998, Roberts 2010, Sheehan to appear). We show that agreement syncretisation and the emergence of overt pronominal subjects proceeded at the same rate. On the Constant Rate Hypothesis of Kroch (1989), which states that a grammatical change has the same rate in different contexts, these results are compatible with the traditional analysis. However, we show that this analysis also generates a number of predictions which are not borne out by the quantitative data. We conclude that a more complex model of the interaction of subject and inflection parameters is needed.
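
The Constant Rate Hypothesis mentioned in this abstract is commonly formalised with logistic curves that share one slope across contexts and differ only in their intercepts. The sketch below illustrates that idea only; the slope, the intercepts and the context names are hypothetical values chosen for the example, not estimates from the paper.

```python
import math

def logistic(t, slope, intercept):
    """Probability of the innovative form at time t under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * t)))

def log_odds(p):
    """Logit transform: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical parameters, for illustration only (not taken from the paper).
SHARED_SLOPE = 0.02  # change in log-odds per year, identical in both contexts
INTERCEPTS = {"syncretic_agreement": -26.0, "overt_subject": -25.0}

def rate_of_change(context, t0, t1):
    """Average change in log-odds per year in one context between t0 and t1."""
    k = INTERCEPTS[context]
    start = log_odds(logistic(t0, SHARED_SLOPE, k))
    end = log_odds(logistic(t1, SHARED_SLOPE, k))
    return (end - start) / (t1 - t0)
```

Under these assumptions, `rate_of_change("syncretic_agreement", 1100, 1500)` and `rate_of_change("overt_subject", 1100, 1500)` agree up to floating-point error even though the two curves are offset in time, which is the pattern the hypothesis predicts when two changes reflect a single underlying one.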

    Expériences d'analyse syntaxique statistique du français

    National audience. We show that satisfactory parsing results for French can be obtained from data induced from the French Treebank using an unlexicalised parsing algorithm that learns a probabilistic context-free grammar with latent annotations. We investigate various instantiations of the treebank in order to improve the performance of the learnt parser. Abstract translated from the French: We show that it is possible to obtain satisfactory statistical parsing of French newspaper text from data drawn from the French Treebank of the LLF laboratory, using an unlexicalised parsing algorithm.

    Improving generative statistical parsing with semi-supervised word clustering

    Short paper (4 pages). International audience. We present a semi-supervised method to improve statistical parsing performance. We focus on the well-known problem of lexical data sparseness and present experiments on word clustering prior to parsing. We use a combination of lexicon-aided morphological clustering that preserves tagging ambiguity, and unsupervised word clustering, trained on a large unannotated corpus. We apply these clusterings to the French Treebank, and we train a parser with the PCFG-LA unlexicalized algorithm of Petrov et al. (2006). We find a gain in French parsing performance: from a baseline of F1=86.76% to F1=87.37% using morphological clustering, and up to F1=88.29% using further unsupervised clustering. This is the best known score for French probabilistic parsing. These preliminary results are encouraging for statistically parsing morphologically rich languages, and languages with a small amount of annotated data.
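
As a rough illustration of clustering prior to parsing, the sketch below replaces each token with a cluster identifier before the text reaches a parser, and computes the F1 gains quoted in the abstract. The cluster table, the cluster names and the choice to leave unknown words unchanged are assumptions made for the example; the paper derives its clusters from a lexicon and from an unannotated corpus rather than from a hand-written table.

```python
# Hypothetical word-to-cluster table, for illustration only.
CLUSTERS = {
    "mange": "C_V1", "mangent": "C_V1", "mangeait": "C_V1",
    "pomme": "C_N7", "pommes": "C_N7",
}

def cluster_tokens(tokens, clusters):
    """Replace each token by its cluster identifier.

    Tokens missing from the table are kept as-is (an assumption of this sketch).
    """
    return [clusters.get(tok.lower(), tok) for tok in tokens]

def f1_gain(baseline, improved):
    """Absolute F1 gain in percentage points, rounded to two decimals."""
    return round(improved - baseline, 2)

# F1 figures as reported in the abstract above.
BASELINE_F1 = 86.76
MORPHOLOGICAL_F1 = 87.37
UNSUPERVISED_F1 = 88.29
```

With these figures, `f1_gain(BASELINE_F1, MORPHOLOGICAL_F1)` is 0.61 points and `f1_gain(BASELINE_F1, UNSUPERVISED_F1)` is 1.53 points. Mapping several inflected forms to one symbol, as `cluster_tokens` does, is the mechanism by which clustering reduces lexical sparseness.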

    Lexical Classes for structuring the lexicon of a TAG

    International conference with proceedings and peer review. International audience. This paper presents work in progress on a system for structuring the lexicon of a Semantic Tree Adjoining Grammar for French. It focuses on lexical classes as an alternative to structuring the lexicon with lexical rules: instead of deriving additional lexical structures by means of lexical rules, we show that a whole lexicon can be enumerated compactly by combining primitive lexical descriptions.

    Multilingual discriminative lexicalized parsing

    International audience. We provide a generalization of discriminative lexicalized shift-reduce parsing techniques for phrase structure grammar to a wide range of morphologically rich languages. The model is efficient and outperforms recent strong baselines on almost all languages considered. It takes advantage of a dependency-based modelling of morphology and a shallow modelling of constituency boundaries.
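
For readers unfamiliar with shift-reduce parsing, the toy sketch below shows only the transition mechanics: a stack, a buffer, and two action types. It is not the paper's model. A discriminative parser would score candidate actions with a learned model, whereas here the action sequence is supplied by hand, and the function name and action encoding are hypothetical.

```python
def shift_reduce(tokens, actions):
    """Apply ("shift",) and ("reduce", label, n) actions; return the resulting tree."""
    stack, buffer = [], list(tokens)
    for action in actions:
        if action[0] == "shift":
            if not buffer:
                raise ValueError("shift on empty buffer")
            # Move the next input token onto the stack.
            stack.append(buffer.pop(0))
        elif action[0] == "reduce":
            _, label, n = action
            if n < 1 or n > len(stack):
                raise ValueError("invalid reduce size")
            # Replace the top n stack items by one labelled constituent.
            children = stack[-n:]
            del stack[-n:]
            stack.append((label, children))
        else:
            raise ValueError(f"unknown action: {action[0]}")
    if buffer or len(stack) != 1:
        raise ValueError("action sequence does not yield a single tree")
    return stack[0]

# Hand-written derivation for a three-word sentence (illustrative labels).
EXAMPLE_TOKENS = ["le", "chat", "dort"]
EXAMPLE_ACTIONS = [
    ("shift",), ("shift",), ("reduce", "NP", 2),
    ("shift",), ("reduce", "VP", 1), ("reduce", "S", 2),
]
```

Calling `shift_reduce(EXAMPLE_TOKENS, EXAMPLE_ACTIONS)` builds a tree with an `S` node over an `NP` and a `VP`. Each action takes constant time apart from list bookkeeping, which is one reason transition-based parsers of this family tend to be efficient.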

    Fréquence, longueur et préférences lexicales dans le choix de la position de l'adjectif épithète en français

    International audience. This article presents a quantitative syntax study of the alternation in adjective position within the noun phrase in French. Starting from the hypothesis that the constraints on adjective placement are essentially preferential, we apply an empirical method that relies on syntactically annotated data on the one hand and on statistical inference on the other, in order to characterise formally the relative importance of the constraints involved in this phenomenon. Focusing on the main known factors that concern the adjectival item (the lexical classes to which adjectives belong, morphological properties, length, frequency), we show that the ordering of adjective and noun rests largely on these properties, and hence on the characteristics of each adjectival item. We also highlight the importance of the usage factors of length and frequency. Finally, our work contributes methodological elements showing that the actual choice of adjective position can be modelled with a probabilistic approach and syntactically annotated data. We also take care, to some extent, to identify the practical limits currently encountered when carrying out this type of study on French.

    Analyse syntaxique du français : des constituants aux dépendances

    10 pages. International audience. This paper describes a technique for both constituent and dependency parsing. Parsing proceeds by adding functional labels to the output of a constituent parser trained on the French Treebank in order to further extract typed dependencies. On the one hand, we specify on formal and linguistic grounds the nature of the dependencies to output, as well as the conversion algorithm from the French Treebank to this dependency representation. On the other hand, we describe a class of algorithms that automatically label functions in the output of a constituent-based parser. We specifically focus on discriminative learning methods for functional labelling.

    Boosting for Efficient Model Selection for Syntactic Parsing

    International audience. We present an efficient model selection method using boosting for transition-based constituency parsing. It is designed for exploring a high-dimensional search space, defined by a large set of feature templates, as is typically the case, for example, when parsing morphologically rich languages. Our method removes the need to manually define heuristic constraints, which are often imposed in current state-of-the-art selection methods. Our experiments for French show that the method is more efficient and is also capable of producing compact, state-of-the-art models.
